Fast Integral Histogram Computations on GPU for Real-Time Video Analytics

Authors

  • Mahdieh Poostchi
  • Kannappan Palaniappan
  • Da Li
  • Michela Becchi
  • Filiz Bunyak
  • Guna Seetharaman

Abstract

In many multimedia content analytics frameworks, feature likelihood maps represented as histograms play a critical role in the overall algorithm. Integral histograms provide an efficient computational framework for extracting multi-scale, histogram-based regional descriptors in constant time, which are considered the principal building blocks of many video content analytics frameworks. We evaluate four different mappings of the integral histogram computation onto Graphics Processing Units (GPUs) using different kernel optimization strategies. Our kernels perform cumulative sums on row and column histograms in a cross-weave or wavefront scan order, and they use different data organization and scheduling methods, which are shown to critically affect the utilization of GPU resources (cores and shared memory). Tiling the 3-D array into smaller regular data blocks significantly improves the efficiency of the computation compared to a strip-based organization. The tiled integral histogram using a diagonal wavefront scan has the best performance, about 300.4 frames/s for 640 × 480 images and 32 bins, a speedup factor of about 120 on a GTX Titan X graphics card compared to a single-threaded sequential CPU implementation. Double buffering is exploited to overlap computation and communication across a sequence of images. Mapping the integral histogram bin computations onto multiple GPUs enables us to process 32 GB of integral histogram data (for a 64 MB image and 128 bins) at a frame rate of 0.73 Hz, a speedup factor of 153X over a single-threaded CPU implementation and 45X over a 16-threaded CPU implementation.
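
To make the two scan orders and the constant-time region query concrete, here is a minimal, sequential Python sketch. It is not the authors' GPU implementation. The function names, the H[y][x][bin] layout, and the equal-width binning of 8-bit intensities are assumptions chosen for illustration. Both builders compute the same integral histogram. The cross-weave version runs a horizontal cumulative-sum pass over every row and then a vertical pass over every column. The wavefront version fills one anti-diagonal at a time using the standard recurrence H(x, y) = H(x-1, y) + H(x, y-1) - H(x-1, y-1) + Q(I(x, y)), where Q is the one-hot bin vector of pixel I(x, y). The region query then needs only four lookups per bin, however large the rectangle is.

```python
from typing import List

# Hypothetical types for this sketch: an image is a list of rows of 8-bit
# intensities, and the integral histogram is indexed as H[y][x][bin].
Image = List[List[int]]
Hist3D = List[List[List[int]]]


def bin_index(value: int, num_bins: int, max_value: int = 256) -> int:
    """Assumed binning: equal-width bins over [0, max_value)."""
    return value * num_bins // max_value


def empty_hist(h: int, w: int, num_bins: int) -> Hist3D:
    return [[[0] * num_bins for _ in range(w)] for _ in range(h)]


def integral_histogram_crossweave(img: Image, num_bins: int) -> Hist3D:
    """Cross-weave order: a horizontal cumulative-sum pass over every row,
    followed by a vertical cumulative-sum pass over every column."""
    h, w = len(img), len(img[0])
    H = empty_hist(h, w, num_bins)
    for y in range(h):
        for x in range(w):
            H[y][x][bin_index(img[y][x], num_bins)] = 1  # one-hot pixel histogram
    for y in range(h):  # rows are independent of each other
        for x in range(1, w):
            for b in range(num_bins):
                H[y][x][b] += H[y][x - 1][b]
    for y in range(1, h):  # columns are independent of each other
        for x in range(w):
            for b in range(num_bins):
                H[y][x][b] += H[y - 1][x][b]
    return H


def integral_histogram_wavefront(img: Image, num_bins: int) -> Hist3D:
    """Diagonal wavefront order: cells on anti-diagonal d = x + y depend only on
    diagonals d - 1 and d - 2, so all cells of one diagonal are independent."""
    h, w = len(img), len(img[0])
    H = empty_hist(h, w, num_bins)
    for d in range(h + w - 1):
        for y in range(max(0, d - w + 1), min(h, d + 1)):
            x = d - y
            for b in range(num_bins):
                left = H[y][x - 1][b] if x > 0 else 0
                up = H[y - 1][x][b] if y > 0 else 0
                corner = H[y - 1][x - 1][b] if x > 0 and y > 0 else 0
                H[y][x][b] = left + up - corner
            H[y][x][bin_index(img[y][x], num_bins)] += 1
    return H


def region_histogram(H: Hist3D, top: int, left: int,
                     bottom: int, right: int) -> List[int]:
    """Histogram of the inclusive rectangle rows top..bottom, columns
    left..right: four lookups per bin, whatever the rectangle's size."""
    result = []
    for b in range(len(H[0][0])):
        total = H[bottom][right][b]
        if top > 0:
            total -= H[top - 1][right][b]
        if left > 0:
            total -= H[bottom][left - 1][b]
        if top > 0 and left > 0:
            total += H[top - 1][left - 1][b]
        result.append(total)
    return result


def brute_force_histogram(img: Image, num_bins: int, top: int, left: int,
                          bottom: int, right: int) -> List[int]:
    """Reference: count every pixel of the rectangle directly."""
    counts = [0] * num_bins
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            counts[bin_index(img[y][x], num_bins)] += 1
    return counts


if __name__ == "__main__":
    img = [[0, 64, 128, 192],
           [255, 10, 70, 130],
           [200, 90, 30, 250]]
    H = integral_histogram_crossweave(img, num_bins=4)
    assert H == integral_histogram_wavefront(img, num_bins=4)
    print(region_histogram(H, top=1, left=1, bottom=2, right=3))  # prints [2, 2, 1, 1]
```

On a GPU, each row of the horizontal pass, each column of the vertical pass, or each cell of one anti-diagonal could be handed to its own thread. This sketch does not model the strip and tile data layouts or the double buffering described in the abstract. It also shows why the data volume grows quickly. Assuming one byte per pixel (so a 64 MB image has 2^26 pixels) and 32-bit bin counts, 2^26 pixels × 128 bins × 4 bytes = 2^35 bytes = 32 GiB, which is consistent with the figure quoted above.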

Similar resources

Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

A robust and fast automatic moving object detection and tracking system is essential to characterize the target object and extract spatial and temporal information for different functionalities, including video surveillance systems, urban traffic monitoring and navigation, robotics, medical imaging, etc. A reliable detection and tracking system is required to generalize across huge variations in obje...

Fast GPU-Based Influence Maximization Within Finite Deadlines via Node-Level Parallelism

Influence maximization in the continuous-time domain is a prevalent topic in social media analytics. It relates to the problem of identifying those individuals in a social network, whose endorsement of an opinion will maximize the number of expected follow-ups within a finite time window. This work presents a novel GPU-accelerated algorithm that enables node-parallel estimation of influence spr...

Tree-structured image difference for fast histogram and distance between histograms computation

In this paper we present a new method for fast histogram computation and its extension to bin-to-bin histogram distance computation. The idea consists in using the information of spatial differences between images, or between regions of images (a current one and a reference one), and encoding it into a specific data structure: a tree. The histogram of the current image or of one of its regions is t...

fastHOG - a real-time GPU implementation of HOG

We introduce a parallel implementation of the histogram of oriented gradients algorithm for object detection. Our implementation uses the GPU and the NVIDIA CUDA framework. We achieve speedups of over 67x from the standard sequential code, using a single video card. Furthermore it supports multiple video cards so speedups of 120x or more can be achieved. This allows us to achieve real-time perf...

Efficient GPU Implementation of the Integral Histogram

The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms including object detection, appearance-based tracking, recognition and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan with high GPU utilization and communication-aware data...

Journal:
  • CoRR

Volume: abs/1711.01919

Publication date: 2017